Hardware Resilience Properties of Text-Guided Image Classifiers

Neural Information Processing Systems

This paper presents a novel method to enhance the reliability of image classification models deployed in the face of transient hardware errors. Using enriched text embeddings, derived by prompting GPT-3 with questions per class and encoding the answers with a pretrained CLIP text encoder, we investigate their impact as an initialization for the classification layer. Our approach achieves a remarkable $5.5\times$ average increase in hardware reliability (and up to $14\times$) across various architectures in the most critical layer, with a minimal accuracy drop ($0.3\%$ on average) compared to baseline PyTorch models. Furthermore, our method integrates seamlessly with any image classification backbone, shows results across various network architectures, reduces parameter and FLOPs overhead, and follows a consistent training recipe. This research offers a practical and efficient solution to bolster the robustness of image classification models against hardware failures, with potential implications for future studies in this domain.
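The core idea, initializing the final classification layer with text embeddings of each class, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random vectors stand in for actual CLIP text embeddings of GPT-3-generated class descriptions, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, embed_dim = 5, 64

# Placeholder: in the paper these embeddings would come from a pretrained
# CLIP text encoder applied to GPT-3-generated class descriptions; random
# unit vectors stand in here so the sketch is self-contained.
text_emb = rng.standard_normal((num_classes, embed_dim))
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

# Use the normalized text embeddings as the initial weights of the final
# linear layer; prediction then scores image features by cosine
# similarity against each class's text embedding.
W_cls = text_emb.copy()

def classify(features):
    f = features / np.linalg.norm(features)
    return int(np.argmax(W_cls @ f))
```

Because each class weight starts as a semantically meaningful direction rather than a random one, the layer is plausibly less sensitive to single-weight corruptions than a conventionally initialized head.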



Sensitivity Analysis of Image Classification Models using Generalized Polynomial Chaos

Bahr, Lukas, Poßner, Lucas, Weise, Konstantin, Gröger, Sophie, Daub, Rüdiger

arXiv.org Artificial Intelligence

Integrating advanced communication protocols in production has accelerated the adoption of data-driven predictive quality methods, notably machine learning (ML) models. However, ML models in image classification often face significant uncertainties arising from model, data, and domain shifts. These uncertainties lead to overconfidence in the classification model's output. To better understand these models, sensitivity analysis can help to analyze the relative influence of input parameters on the output. This work investigates the sensitivity of image classification models used for predictive quality. We propose modeling the distributional domain shifts of inputs with random variables and quantifying their impact on the model's outputs using Sobol indices computed via generalized polynomial chaos (GPC). This approach is validated through a case study involving a welding defect classification problem, utilizing a fine-tuned ResNet18 model and an emblem classification model used in BMW Group production facilities.
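The abstract's pipeline (fit a generalized polynomial chaos surrogate, then read Sobol indices off its coefficients) can be sketched for a hypothetical two-input model. The model, polynomial degree, and sample count below are illustrative assumptions, not the paper's setup.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal((n, 2))        # two standard-normal inputs

# Hypothetical model (not from the paper): y = x1 + 0.5*x2^2 + 0.3*x1*x2
y = x[:, 0] + 0.5 * x[:, 1] ** 2 + 0.3 * x[:, 0] * x[:, 1]

# Probabilists' Hermite polynomials He_0..He_2, orthogonal under N(0, 1)
he = {0: lambda t: np.ones_like(t), 1: lambda t: t, 2: lambda t: t ** 2 - 1}

# Tensor-product basis up to total degree 2, fit by least squares
alphas = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (1, 1)]
Psi = np.column_stack([he[a](x[:, 0]) * he[b](x[:, 1]) for a, b in alphas])
coef, *_ = np.linalg.lstsq(Psi, y, rcond=None)

# Variance decomposes over the orthogonal basis:
# Var = sum_k c_k^2 * ||psi_k||^2, with ||He_a He_b||^2 = a! * b!
norms = np.array([math.factorial(a) * math.factorial(b) for a, b in alphas])
partial = coef ** 2 * norms
total_var = partial[1:].sum()          # exclude the constant term

# First-order Sobol indices: variance share of terms in one input only
S1 = sum(p for p, (a, b) in zip(partial, alphas) if a > 0 and b == 0) / total_var
S2 = sum(p for p, (a, b) in zip(partial, alphas) if b > 0 and a == 0) / total_var
```

For this toy model the exact values are S1 = 1/1.59 and S2 = 0.5/1.59, with the remaining variance attributed to the x1-x2 interaction term; the surrogate recovers them because the model lies exactly in the basis span.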


Exploring Secure Machine Learning Through Payload Injection and FGSM Attacks on ResNet-50

Yadav, Umesh, Niraula, Suman, Gupta, Gaurav Kumar, Yadav, Bicky

arXiv.org Artificial Intelligence

As ML models continue to integrate into critical cybersecurity systems, the ability to exploit these models through adversarial techniques poses significant threats. A study predicts that by 2025, 30% of cyberattacks will involve adversarial machine-learning tactics [16]. Pre-trained models are susceptible to adversarial perturbation attacks, which can undermine trust in AI systems due to the lack of customized … In the modern landscape of cybersecurity, machine learning (ML) models, especially in areas like image classification, are increasingly integrated into systems where robustness and security are paramount. However, these models are highly susceptible to adversarial attacks, where small, crafted perturbations can lead to incorrect predictions and, in more severe cases, unauthorized access or manipulation of systems [2].
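The FGSM attack referenced in the title can be sketched on a toy model. The linear softmax classifier below is a stand-in for ResNet-50 (its weights are random placeholders); a real attack would compute the input gradient by backpropagating through the network with an autograd framework.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear softmax classifier standing in for ResNet-50
# (hypothetical random weights, not a trained network).
num_classes, dim = 3, 8
W = rng.standard_normal((num_classes, dim))
x = rng.standard_normal(dim)   # "clean" input
y = 0                          # true label

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss(v):
    return -np.log(softmax(W @ v)[y])

# FGSM: step the input along the sign of the loss gradient.
# For softmax cross-entropy on a linear model, dL/dx = W^T (p - onehot_y),
# so the gradient is available in closed form.
p = softmax(W @ x)
grad_x = W.T @ (p - np.eye(num_classes)[y])
eps = 0.1
x_adv = x + eps * np.sign(grad_x)
```

The perturbation is bounded by eps in the max norm, and for this convex toy model the loss is guaranteed to increase; on a deep network the same one-step update only increases the loss to first order.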


Residual Feature-Reutilization Inception Network for Image Classification

He, Yuanpeng, Song, Wenjie, Li, Lijian, Zhan, Tianxiang, Jiao, Wenpin

arXiv.org Artificial Intelligence

Deep learning has contributed much to this field. The most representative deep neural network architectures in computer vision can be roughly divided into transformer-based and CNN-based models. The transformer was originally proposed for natural language processing, and has since been transferred to vision tasks, recently achieving highly satisfying performance. Specifically, the vision transformer [1] first introduced the attention mechanism into computer vision; its strategy of information interaction noticeably enlarges the effective receptive field of related models, so that crucial information can be better captured. Owing to the efficiency of this architecture, transformer variants have been devised for specific demands, and the improvements fall into two main categories: integrating the transformer framework with other models built for particular uses, and modifying the original architecture. Regarding the former, DS-TransUNet [2] is a typical example, which combines dual transformer-based architectures with U-Net to achieve a breakthrough in medical image segmentation. Other works focus on improving the transformer architecture itself; for instance, Mix-ViT [3] designs a mixed attention mechanism to create more channels for information interaction.


Moving Healthcare AI-Support Systems for Visually Detectable Diseases onto Constrained Devices

Watt, Tess, Chrysoulas, Christos, Barclay, Peter J

arXiv.org Artificial Intelligence

Image classification usually requires connectivity and access to the cloud, which is often limited in many parts of the world, including hard-to-reach rural areas. TinyML aims to solve this problem by hosting AI assistants on constrained devices, eliminating connectivity issues by processing data within the device itself, without internet or cloud access. This pilot study explores the use of tinyML to provide healthcare support with low-spec devices in low-connectivity environments, focusing on the diagnosis of skin diseases and the ethical use of AI assistants in a healthcare setting. To investigate this, 10,000 images of skin lesions were used to train a model for classifying visually detectable diseases (VDDs). The model weights were then offloaded to a Raspberry Pi with a webcam attached, to be used for the classification of skin lesions without internet access. The developed prototype achieved a test accuracy of 78% and a test loss of 1.08.


CrisisViT: A Robust Vision Transformer for Crisis Image Classification

Long, Zijun, McCreadie, Richard, Imran, Muhammad

arXiv.org Artificial Intelligence

In times of emergency, crisis response agencies need to quickly and accurately assess the situation on the ground in order to deploy relevant services and resources. However, authorities often have to make decisions based on limited information, as data on affected regions can be scarce until local response services can provide first-hand reports. Fortunately, the widespread availability of smartphones with high-quality cameras has made citizen journalism through social media a valuable source of information for crisis responders. However, analyzing the large volume of images posted by citizens requires more time and effort than is typically available. To address this issue, this paper proposes the use of state-of-the-art deep neural models for automatic image classification/tagging, specifically by adapting transformer-based architectures for crisis image classification (CrisisViT). We leverage the new Incidents1M crisis image dataset to develop a range of new transformer-based image classification models. Through experimentation over the standard Crisis image benchmark dataset, we demonstrate that the CrisisViT models significantly outperform previous approaches in emergency type, image relevance, humanitarian category, and damage severity classification. Additionally, we show that the new Incidents1M dataset can further augment the CrisisViT models resulting in an additional 1.25% absolute accuracy gain.